Abstract

A software organization is presented to provide for data definition and
manipulation in a distributed data base management system. With the
mechanism for distributing the data base proposed here, the physical
location of the data is transparent to the user program. A Device Media
Control Language is specified for assigning control of, and access to, a
data base area to a set of processors. Procedures for reassignment of the
control and access functions, as well as for the transfer of data between
processors, are provided. The basic hardware and software requirements for
a computer network capable of supporting a distributed data base management
system are discussed, along with a specification of the software required
for a processor in a distributed data base network.


I.  Introduction 


This paper presents a software organization for a distributed data base
management system (DDBMS). A DDBMS is a data base management system that
resides on a network of computers. The processors in the network may
perform any combination of the following three functions:

a.  Front-end  =  act  as  user  interface,  receive  input,  transmit  output. 

b.  Host  =  execute  the  application  program. 

c.  Back-end  =  control  data  base  access  through  execution  of  data 
base  system  software. 

The DDBMS software structure presented in this paper reflects the
CODASYL-type data base systems [1,2]. The basic software distribution and
several  possible  hardware  configurations  for  DDBMS  systems  are  discussed 
in  Reference  [3].  The  emphasis  of  this  paper  is  to  specify  the  software 
functions  that  are  required  in  order  to  provide  for  proper  data  definition 
and  manipulation  in  a  DDBMS. 

One of the principal tenets of the proposed DDBMS organization is that the
physical distribution of the data be transparent to the user. This implies
that, at the application program level, both the program and the user are
unconcerned with the precise physical location of the data or of the
processor that is accessing the data. In addition, the DDBMS must have the
capability of moving data among secondary storage devices and DBMS
functions among processors. However, the user may stipulate that units of
data be physically close. Additionally, the system software must be
portable so that an application program can execute on any host machine
and access data through any back-end processor. This ability to request
the relocation of tasks and data lies partially within the DDBMS, but, as
with all application software, the allocation of resources depends on the
network operating system (NOS) upon which the DDBMS is constructed (in this
paper it is supported by an NOS subsystem called Network Resource Control
(NRC)). Information on the characteristics of a network operating system
capable of supporting a DDBMS can be obtained in Reference [4].

The remaining sections of this paper contain a set of proposed mechanisms
for defining and utilizing data in a DDBMS which has the capabilities
described in the preceding paragraphs.
II. Data Definition

In a CODASYL-type DBMS the description of the data is carried out in three
steps. Initially, the schema Data Description Language (DDL) is used to
describe the logical organization and format of the data base. That
portion of the data base accessible to a particular program is defined by
means of the sub-schema DDL. The schema and sub-schema are both logical
descriptions of the data. The logical-to-physical mapping is accomplished
through use of the Device Media Control Language (DMCL). It is important to
note that the CODASYL committee considered the DMCL an
implementation-dependent feature of a DBMS and, consequently, has not
specified a DMCL. It is through the DMCL that the distribution of the data
base is accomplished.

In the definition of a distributed data base (as in the case of a central
system), the existence or lack of physical proximity of records is
determined by their placement in areas. Areas serve as the atomic data unit
in terms of the distribution of the data base. The DMCL is used to
associate areas with both logical back-end processors and secondary storage
devices. This information is compiled from the DMCL and stored in the Area
Logical Location (ALL) Table. The system may transport areas between
physical devices. Such actions remain transparent to the DBMS application
programs provided that the ALL Tables and the tables of contents of the
actual disk cartridges, which indicate respectively the physical location
of the area and the contents of the cartridge, are properly updated.

By  restricting  the  effects  of  transparency  to  the  description  of  the 
areas  in  the  DMCL,  both  the  schema  and  sub-schema  DDL's  remain  unchanged  in 
a  DDBMS.  The  syntax  of  a  DMCL  for  a  distributed  CODASYL-type  data  base 
system  is  shown  in  Figure  1. 

Schema Section

   DMCL FOR SCHEMA schemaname.

Subschema Section

   SUBSCHEMA IS subschemaname.

Area Section

   AREA IS areaname.
      NUMBER OF PAGES IS integer1.
      PAGE SIZE IS integer2.
      BACKEND IS processorname.
      DEVICE IS devicename.
         (TYPE IS identifier.)
      HOSTS ARE (processorname).

Figure 1
DMCL Syntax

The NUMBER OF PAGES and PAGE SIZE sentences in the Area Section are used
to describe the size of the area. The BACKEND sentence provides a logical
name for the processor which controls access to the area. The DEVICE
sentence provides a logical name and physical type for the device upon
which the area is to be stored. A physical type can include disk, tape,
cassette, floppy disk, or any other mass storage medium. The physical
device type name can depend on the back-end processor or, preferably,
follow a standard network convention. The HOSTS sentence logically names
the processors which may contain application programs that access the data
in the area being described.

The HOSTS sentence provides an additional measure of security to the
DDBMS. The data base administrator can specify whether the area may be
accessed globally or is restricted to a certain application system. It is
important to note that the HOSTS are identified logically in the DMCL. The
HOSTS sentences specify which application functions may access the data in
an area. An application function may reside on several distinct processors
and may be moved among processors by the network operating system.
However, unless the data base administrator assigns a logical function name
that appears in the HOSTS sentence of an area to a physical processor, no
program executing on that processor may access data in that area.
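The HOSTS rule in the preceding paragraph amounts to a set-intersection test. The sketch below illustrates it under assumed data structures (plain dictionaries of sets); `may_access` and all sample names are hypothetical.

```python
def may_access(area_hosts: dict[str, set[str]],
               assigned_functions: dict[str, set[str]],
               processor: str, area: str) -> bool:
    """Return True if some logical function assigned to `processor`
    appears in the HOSTS sentence of `area` (hypothetical sketch).

    area_hosts: area name -> logical host (function) names from HOSTS
    assigned_functions: physical processor -> logical functions that the
    data base administrator has assigned to it
    """
    hosts = area_hosts.get(area, set())
    functions = assigned_functions.get(processor, set())
    # Access is allowed only if the two sets share at least one name.
    return not hosts.isdisjoint(functions)


area_hosts = {"NAMES": {"PAYROLL-APP"}, "SALARIES": {"PAYROLL-APP"}}
assigned = {"CPU-1": {"PAYROLL-APP"}, "CPU-2": {"INVENTORY-APP"}}
```

Moving the PAYROLL-APP function to CPU-2 would change only the `assigned` mapping; the DMCL, and hence the HOSTS sentence, stays untouched.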

The  product  of  a  DMCL  compilation  is  an  ALL  Table  whose  entry  format 
is  depicted  in  Figure  2.  An  ALL  Table  is  maintained  for  each  schema.  Within 
the  table  the  information  is  grouped  by  area.  The  area  information  is  obtained 
directly  from  the  DMCL  code. 
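To make the compiled form concrete, the following Python sketch models one schema's ALL Table as a dictionary keyed by area name. The entry fields mirror the Area Section sentences of Figure 1. The class `AllEntry`, the helper `compile_area`, and all sample names are hypothetical; the actual entry layout is the one given in Figure 2.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AllEntry:
    """One ALL Table entry (hypothetical layout) built from an Area Section."""
    area: str
    number_of_pages: int
    page_size: int
    backend: str          # logical back-end processor name (BACKEND sentence)
    device: str           # logical device name (DEVICE sentence)
    device_type: str      # physical device type, e.g. "disk" or "tape"
    hosts: frozenset = field(default_factory=frozenset)  # logical host names

    @property
    def size(self) -> int:
        # The area size follows from NUMBER OF PAGES and PAGE SIZE.
        return self.number_of_pages * self.page_size


def compile_area(all_table: dict, **area_section) -> None:
    """Add one compiled Area Section to a schema's ALL Table (keyed by area)."""
    hosts = frozenset(area_section.pop("hosts", ()))
    entry = AllEntry(hosts=hosts, **area_section)
    if entry.area in all_table:
        raise ValueError(f"area {entry.area!r} described twice")
    all_table[entry.area] = entry


# One ALL Table per schema; the names below are invented for the example.
personnel_all = {}
compile_area(personnel_all, area="NAMES", number_of_pages=200, page_size=1024,
             backend="BE-PAYROLL", device="PACK-7", device_type="disk",
             hosts=["PAYROLL-APP", "AUDIT-APP"])
```

Grouping by area falls out of the dictionary key. A real DMCL compiler would parse the DMCL text rather than take keyword arguments.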

Figure 2
ALL Table Entry Format

It should be noted that processors and devices are identified logically in
the ALL Table. This allows areas and tasks to change their physical
location in a manner transparent to the DBMS. The logical-to-physical
mapping for processors is maintained in the network operating system
nucleus on each machine. The companion mapping for devices is maintained
locally at each network node. Figure 3 illustrates the ALL Table and
logical-to-physical map for a sample distributed data base.

III.  Data  Manipulation  in  Distributed  Data  Access 

The mechanisms for data manipulation in a DDBMS can be presented by
considering the steps that are required for a user program to access data
on a device connected to another processor.

Before  an  application  program  can  access  data  in  an  area,  it  must  first 
issue  a  READY  statement  [1]  for  that  area.  The  data  base  system  obtains  the 
logical  back-end  processor  and  device  name  from  the  ALL  Table  entry  of  the 
area  named  in  the  READY  statement.  The  physical  processor  name  is  obtained 
from the Logical-to-Physical Processor Map maintained by the network operating
system.  A  message  is  then  sent  to  the  back-end  processor  requesting  that 
the  area  be  made  available  to  the  application  program  and  that  a  task  be 
created  for  that  sub-schema  (if  one  does  not  already  exist).  The  general 
format  of  the  messages  sent  in  the  DDBMS  is  given  in  Figure  4.  If  the  device 
containing  the  area  is  online  at  the  back-end  processor  and  all  integrity 
and  security  requirements  are  satisfied,  the  back-end  transmits  a  message  to 
the  host  processor  and  the  application  program  may  begin  to  utilize  the  data 
in  that  area. 
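The lookup chain performed for a READY statement can be sketched as follows. The sketch assumes that the ALL Table and the Logical-to-Physical Processor Map are plain dictionaries. Transmission of the request message to the back-end is omitted, and `resolve_ready` and `ReadyError` are hypothetical names.

```python
class ReadyError(Exception):
    """Raised when a READY cannot be routed (hypothetical error type)."""


def resolve_ready(all_table: dict[str, dict], lpp_map: dict[str, str],
                  area: str) -> tuple[str, str]:
    """Return (physical back-end processor, logical device) for READY `area`.

    all_table: area -> {"backend": logical back-end, "device": logical device}
    lpp_map: logical processor name -> physical processor name
    """
    try:
        entry = all_table[area]
    except KeyError:
        raise ReadyError(f"area {area!r} is not in the ALL Table") from None
    logical_backend = entry["backend"]
    physical = lpp_map.get(logical_backend)
    if physical is None:
        # A separate path handles this case: the host must find a processor
        # willing to assume this logical back-end role.
        raise ReadyError(f"no physical processor for {logical_backend!r}")
    return physical, entry["device"]


all_table = {"NAMES": {"backend": "BE-PAYROLL", "device": "PACK-7"}}
lpp_map = {"BE-PAYROLL": "NODE-3"}
```

The host would then send its request message to `NODE-3`, naming `PACK-7` so that the back-end can check whether that device is online.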

Figure 3
Sample Distributed Data Base (data base layout and partial ALL Table)

SEND MESSAGE (TO_ID, MESSAGE, SIZE, EVT_ID)
WHERE
   TO_ID IS THE SYMBOLIC NAME OF THE TASK WHICH IS TO RECEIVE THE MESSAGE.
   MESSAGE SPECIFIES THE BEGINNING ADDRESS OF THE MESSAGE TO BE SENT.
   SIZE SPECIFIES THE SIZE OF THE MESSAGE TO BE SENT.
   EVT_ID IS AN EVENT UNIQUELY IDENTIFIED WITH THIS MESSAGE.

RECEIVE MESSAGE (FROM_ID, MESSAGE, EVT_ID, SIZE)
WHERE
   FROM_ID IDENTIFIES THE NAME OF A TASK FROM WHICH A MESSAGE IS TO BE
   RECEIVED.
   EVT_ID, SIZE, AND MESSAGE ARE AS BEFORE.

WAIT (EVT_ID)
WHERE
   EVT_ID IS AN EVENT WHICH IS SET TO "HAPPENED" WHEN ITS ASSOCIATED
   OPERATION IS COMPLETE. IT ALLOWS PROCESSES TO SUSPEND AWAITING A
   PARTICULAR OPERATION.

SEND_COMMAND (TO_ID, COMMAND, EVT_ID)
ACCEPT_COMMAND (FROM_ID, COMMAND, EVT_ID)
WHERE
   COMMAND IS A FIXED-SIZE (SMALL) MESSAGE.

CONNECT (TO_ID, COMMAND)
DISCONNECT (FROM_ID, TYPE)
WHERE
   TYPE = IMMEDIATE: ABORT ALL ACTIVE MESSAGES;
          QUIESCE: ALLOW ACTIVE MESSAGES TO BE SENT BUT NO MORE INITIATED.

Figure 4
Conceptual Message Formats

There are several situations which can hinder the completion of the READY
statement. First, we treat the problem of the device(s) containing the
area not being online. In this case, a message is transmitted to the
operator of the back-end processor, requesting mounting of the proper disk
pack. The pack is identified by the name obtained from the ALL Table. When
the pack is placed online, the data base system automatically verifies its
name against that requested by the application to detect any possible
operator errors. Each data base contains a header which indicates the
logical (back-end) processors capable of accessing its data. This feature
is included for security purposes. If the processor names correspond, the
back-end processor will notify the host processor that the area is
available for data manipulation.
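The two checks made when a requested pack comes online (the pack name, and the logical back-end names recorded in its header) could be coded as below. The `PackHeader` fields, the function `verify_mounted_pack`, and the choice of exception types are assumptions for illustration and do not reproduce the paper's header format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackHeader:
    """Header recorded on a data base pack (hypothetical fields)."""
    pack_name: str
    backends: frozenset  # logical back-end names allowed to access the data


def verify_mounted_pack(header: PackHeader, requested_pack: str,
                        my_logical_names: set[str]) -> None:
    """Raise if the operator mounted the wrong pack, or if this processor
    holds no logical back-end role named in the pack header."""
    if header.pack_name != requested_pack:
        # Operator error: a different pack was mounted.
        raise ValueError(f"operator mounted {header.pack_name!r}, "
                         f"expected {requested_pack!r}")
    if header.backends.isdisjoint(my_logical_names):
        # Security check: this processor is not listed in the header.
        raise PermissionError("processor not authorized for this pack")
    # Both checks passed: the back-end may now notify the host that the
    # area is available for data manipulation.


header = PackHeader("PACK-7", frozenset({"BE-PAYROLL"}))
```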

Another problem occurs when there is no entry in the Logical-to-Physical
Processor Map for the back-end processor name obtained from the ALL Table.
In this case the host processor must obtain from the Data Dictionary a
list of processors that have the potential of operating under that logical
back-end name. Messages requesting that a back-end processor assume the
functions associated with that logical name are then transmitted to the
processors in that list. Once a back-end is identified, a request is made
to mount the appropriate device for the area being READYed. If no machine
can assume the requested logical processor role, the application program
on the host processor is terminated.
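The fallback just described can be sketched as a loop over the candidate list. For simplicity the sketch issues the requests one at a time, whereas the text allows them to be transmitted to all candidates. `assume_backend_role` and the `ask` callback (standing in for the request message and its reply) are hypothetical.

```python
from typing import Callable, Iterable, Optional


def assume_backend_role(logical_name: str,
                        candidates: Iterable[str],
                        ask: Callable[[str, str], bool],
                        lpp_map: dict[str, str]) -> Optional[str]:
    """Ask each candidate processor (from the Data Dictionary) in turn to
    assume `logical_name`, and record the first acceptance in the L-P-P map.

    Returns the accepting processor, or None; on None the caller would
    terminate the application program, as described above.
    """
    for processor in candidates:
        if ask(processor, logical_name):   # stands in for a request message
            lpp_map[logical_name] = processor
            return processor
    return None
```

After a successful assignment, the other processors in the network would still have to be told of the new logical name, as the next paragraph explains.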

When a processor assumes a back-end function, a task must be created
in  that  processor  to  handle  the  data  base  requests  for  that  area.  All  other 
processors  in  the  network  then  are  notified  of  the  new  logical  name  for  the 
processor. 

Once the area has been READYed, the user program on the host machine may
issue DML commands to reference the data. The area name for any record to
be accessed by the DML commands is available in the sub-schema, which is
attached to the application process during process initiation. Using the
area name, the physical location of the record in the network can be
determined from the ALL Table; the host machine then transmits a message to
the appropriate back-end processor, which performs the DBMS operation and
transmits the results back to the host processor via the message system.

In a CODASYL-type DBMS, an operation on a record in one area may result in
the need for operations on records in other areas (for example, the removal
of an owner record and thus all its members from the data base). In such
situations, the back-end processor controlling the operation determines the
back-end processor name for the affected area from the ALL Table. If the
processor names are different, a message indicating the necessary data base
action must be sent to the back-end processor of that area. This procedure
could recur several times before completion of the original DBMS
operation, depending upon the complexity and distribution of the data base.
When the original back-end processor has received completion messages from
all of the secondary back-end machines, it transmits a message with the
appropriate data and status information to the host computer.
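The bookkeeping done by the controlling back-end can be sketched as follows. It splits the secondary actions by the back-end named in the ALL Table, performs its own share locally, and replies to the host only after every remote completion has arrived. The callbacks `run_local` and `send_remote` are hypothetical stand-ins for the DBMS and the message system. Any further cascading on a secondary back-end is hidden inside `send_remote`.

```python
from collections import defaultdict
from typing import Callable


def run_cascaded(my_backend: str,
                 all_table: dict[str, str],
                 secondary_ops: list[tuple[str, str]],
                 run_local: Callable[[str, str], str],
                 send_remote: Callable[[str, list], list]) -> dict:
    """Carry out the secondary operations triggered by one DML operation.

    all_table: area -> logical back-end name (the only field needed here)
    secondary_ops: (area, action) pairs, e.g. deleting member records
    Returns a status map that would go back to the host as one reply.
    """
    remote = defaultdict(list)
    status = {}
    for area, action in secondary_ops:
        backend = all_table[area]
        if backend == my_backend:
            status[(area, action)] = run_local(area, action)
        else:
            remote[backend].append((area, action))
    # One message per other back-end; every completion is collected
    # before the combined status is returned.
    for backend, ops in remote.items():
        for op, result in zip(ops, send_remote(backend, ops)):
            status[op] = result
    return status
```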

IV.  Task  and  Data  Movement 
A.  Conceptual  Aspects 

In  a  DDBMS,  it  may  be  desirable  for  reasons  of  efficiency  or  security 
to  change  the  physical  location  of  a  data  base  management  task  or  data 
area.  Movement  of  data  can  occur  either  logically  by  a  programmed  (file) 
transfer  of  an  area  between  storage  devices  or  physically  by  an  operator 
moving  a  storage  device  from  one  computer  to  another.  Tasks  are  moved  in 
a  DDBMS  by  making  use  of  features  provided  in  a  network  operating  system. 

The case of logical data movement will be considered first. When an area
is moved between devices controlled by the same back-end processor, the
device name must be modified in the ALL Table of that back-end processor
and the header record on the storage medium must be updated. Movement
between devices attached to different processors requires updating the ALL
Tables of all back-end processors that control any portion of the
sub-schemas containing the transported areas. It can be seen from the
description of data manipulation in Section III that if the ALL Tables are
properly updated, there will be no effect on either the user program on the
host or the data base system on the back-end machine.
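The two cases of logical movement differ only in how many ALL Tables must change. The following minimal sketch assumes that each back-end keeps its own copy of the ALL Table as a dictionary. `move_area` is hypothetical, and the update of the header record on the storage medium is not modelled.

```python
def move_area(all_tables: dict[str, dict[str, dict]], area: str,
              old_backend: str, new_backend: str, new_device: str) -> list[str]:
    """Update ALL Table entries after `area` has been copied to `new_device`.

    all_tables: back-end name -> that back-end's ALL Table
    (area -> {"backend": ..., "device": ...}).
    Returns the back-ends whose tables were changed.
    """
    if new_backend == old_backend:
        # Same back-end: only its own ALL Table (and the pack header,
        # not modelled here) needs the new device name.
        all_tables[old_backend][area]["device"] = new_device
        return [old_backend]
    updated = []
    for backend, table in all_tables.items():
        if area in table:
            # Every back-end whose sub-schemas include the area is updated.
            table[area].update(backend=new_backend, device=new_device)
            updated.append(backend)
    return updated
```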

The logical movement of an area is accomplished via network operating
system utility programs, U_s and U_r, executing on B_s, the sending
back-end processor, and B_r, the receiving back-end processor,
respectively. The following procedure is executed to move area A from B_s
to B_r.

Procedure 1 (Logical Area Movement)

Let SST_A be the set of sub-schema tasks for all sub-schemas containing
area A.

1. U_s notifies all tasks in SST_A on B_s that A will be moved.

2. All tasks in SST_A notify the message utility not to accept any further
messages for DML's in area A.

3. The message system on B_s instructs the message utility on all hosts
accessing this area to hold in the host machine any DML messages referring
to A.

4. Each task in SST_A notifies B_s when all DML's for A are complete.

5. U_s sends a message to U_r indicating that A is ready to be moved.

6. U_s writes A onto secondary storage, thus placing the latest version of
A onto disk.

7. A is transferred from the secondary storage of B_s to the secondary
storage of B_r.

8. If, for any sub-schema, A was the only area controlled by B_s, then U_s
must remove the entries for that sub-schema from the ALL Table maintained
by B_s.

9. If, for any sub-schema task in SST_A, A was the only area controlled by
B_s, then the task for that sub-schema must be destroyed on B_s.

10. If A is the only area for a given sub-schema on B_r, then U_r must
update the ALL Table of B_r to include the entries for that sub-schema.

11. Any sub-schema tasks on B_s destroyed by the movement of A must be
created (if not already present) on B_r.

12. If, by receiving A, B_r is now assuming a new logical back-end
function, then the L-P-P Map on B_r must be updated.

13. If, by sending A to B_r, B_s no longer retains that particular
back-end function, then the L-P-P Map on B_s must be updated.

14. Update all L-P-P Maps in the network if necessary.

15. The sub-schema tasks in SST_A on B_r obtain the list of hosts for A
from the ALL Table.

16. The sub-schema tasks in SST_A notify the message utilities which are
queueing requests from their corresponding host tasks that they will now
accept DML messages for A.
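Steps 3 and 16 rely on the host's message utility being able to hold DML messages for one area and later release them in their original order. The following sketch shows that behaviour. The `HostMessageUtility` class is hypothetical, and its `send` callback stands in for the real message system.

```python
from collections import deque


class HostMessageUtility:
    """Host-side message utility that can hold DML messages for an area
    while the area is being moved (hypothetical sketch)."""

    def __init__(self, send):
        self._send = send          # callable(area, dml) delivering to a back-end
        self._held = {}            # area -> deque of held DML messages

    def hold(self, area):
        """Step 3: start holding DML messages that refer to `area`."""
        self._held.setdefault(area, deque())

    def submit(self, area, dml):
        queue = self._held.get(area)
        if queue is not None:
            queue.append(dml)      # kept on the host, not sent
        else:
            self._send(area, dml)

    def release(self, area):
        """Step 16: the receiving back-end accepts DML's again; flush in order."""
        queue = self._held.pop(area, deque())
        while queue:
            self._send(area, queue.popleft())
```

Because `send` is consulted only at delivery time, the released messages would be routed using the ALL Table as updated during the move.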

Figure 5 gives a precedence graph for Procedure 1. The interaction
of  the  utility  tasks  and  system  software  resulting  from  the  execution  of 
Procedure  1  is  illustrated  in  Figure  6.  Only  the  software  directly  involved 
in  logical  area  movement  is  pictured  in  Figure  6. 

The situation in which a physical device is moved is more complex than
logical movement. The most difficult situation occurs if the pack (or any
other type of storage device) is active on a back-end processor and the
operator indicates to the system the desire to remove that pack from online
status. The data base management software on the back-end processor must
complete all requests to areas on that pack and inform the affected hosts
to hold all new requests for those areas. The operator may then disengage
the pack and remount it on a new machine. When the pack is mounted on the
new machine, the procedure outlined in Section III for bringing up a new
pack in the DDBMS can be followed. It is important to note that, to be
eligible to receive the transferred pack, a processor must be read- and
write-compatible with the source processor and must also have the
necessary logical processor function assigned to it.
[Figure: Logical Area Movement]
In order to allow the DDBMS to stop processing a pack prior to its
movement, the list of active areas on the pack is determined from the
Device Header List that is maintained by the back-end processor. For each
area, the list of logical host names is obtained from the ALL Table. The
back-end processor then accesses its Logical-to-Physical Processor Map to
determine the host computers which must be sent the messages indicating an
area transfer.
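The lookup chain just described (Device Header List, then ALL Table, then L-P-P Map) can be written as a short function. The dictionary shapes and the name `hosts_to_notify` are assumptions. The sketch allows the L-P-P Map to be one-to-many, because a logical host name may map to several processors.

```python
def hosts_to_notify(device_header_list: dict[str, list[str]],
                    all_table: dict[str, set[str]],
                    lpp_map: dict[str, list[str]],
                    pack: str) -> set[str]:
    """Physical hosts that must be told to hold requests for areas on `pack`.

    device_header_list: pack -> active areas on that pack
    all_table: area -> logical host names (from the HOSTS sentence)
    lpp_map: logical name -> physical processors (one-to-many allowed)
    """
    physical = set()
    for area in device_header_list.get(pack, []):
        for logical_host in all_table[area]:
            # Logical host names with no current mapping contribute nothing.
            physical.update(lpp_map.get(logical_host, []))
    return physical
```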

The back-end data base software which controls the mounting of a new pack
must be cognizant of requirements stating that, in order for a given pack
to be accessed, some other pack (or packs) must also be attached to the
same processor. Such constraints could occur in the case of multi-device
areas or for security or efficiency reasons.
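One way to check such constraints before a pack is brought online is to follow the requirements transitively and report the companion packs that are still missing. The transitive treatment of requirements is an assumption of this sketch, as are the name `missing_companions` and the sample pack names.

```python
def missing_companions(pack: str, requires: dict[str, set[str]],
                       online: set[str]) -> set[str]:
    """Packs that must also be online before `pack` may be accessed,
    following requirements transitively (e.g. a multi-device area)."""
    needed, stack = set(), [pack]
    while stack:
        for other in requires.get(stack.pop(), set()):
            # Skip packs already collected (handles cycles) and the pack itself.
            if other not in needed and other != pack:
                needed.add(other)
                stack.append(other)
    return needed - online
```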

The physical movement of a data base pack between back-end processors
results from carrying out the steps of Procedure 2. A pack shutdown
utility, U_d, is first executed on the sending back-end processor, B_s. A
pack mount utility, U_m, is then executed on B_r, the receiving back-end
processor.

Procedure 2 (Physical Pack Movement)

1. The B_s operator activates the pack shutdown utility, U_d.

2. U_d initiates steps 1-4, 6, 8-9, and 13 of Procedure 1 for each area on
the pack.

3. U_d notifies the B_s operator that the pack may be moved.

4. The B_s operator removes the pack.

5. The B_r operator mounts the pack.

6. The B_r operator activates the pack mount utility, U_m.

7. U_m initiates steps 10-12 and 14-16 of Procedure 1 for each area on the
pack.

There are two types of task movement in a DDBMS: the transfer of an
application program between processors, and the interprocessor movement of
back-end software. Both cases can be considered as processor function
reassignment.

In the case of host function movement, the only action required is a
change in the Logical-to-Physical Processor Map in the network operating
system. The ability of the appropriate processors to access an area can
also be redefined by executing the DMCL compiler with a modified HOSTS
sentence.

The back-end functions may only be transferred if the receiving processor
is linked to the storage device(s) containing the area. The mechanisms for
accomplishing the transfer are identical to the host situation: either the
network operating system Logical-to-Physical Processor Map or the DMCL
BACKEND sentence can be amended.
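Because both kinds of function movement reduce to editing the Logical-to-Physical Processor Map, a sketch needs little more than a dictionary assignment plus the eligibility test for back-end functions. `reassign_function`, its parameters, and the one-to-one map used here are assumptions for illustration.

```python
from typing import Optional


def reassign_function(lpp_map: dict, logical_name: str, new_processor: str,
                      required_devices: Optional[set] = None,
                      links: Optional[dict] = None) -> None:
    """Move a logical processor function by editing the L-P-P map only.

    For a back-end function, `required_devices` lists the devices holding
    its areas; the receiving processor must be linked to all of them.
    `links` maps physical processor -> devices it is connected to.
    Host functions pass no `required_devices` and need no check.
    """
    if required_devices:
        reachable = (links or {}).get(new_processor, set())
        if not required_devices <= reachable:
            missing = sorted(required_devices - reachable)
            raise ValueError(f"{new_processor} not linked to {missing}")
    lpp_map[logical_name] = new_processor
```

A rejected transfer leaves the map unchanged, so the function keeps running where it was.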

Thus, as in the case of data movement, tasks (processor functions) can be
moved in a DDBMS with minimal overhead and with no alteration required to
user or DBMS software. This statement is predicated upon the
portability  of  the  data  base  system  software.  Given  the  state  of  the 
industry,  movement  must  be  restricted  to  homogeneous  machines  for  the 
present. 

One important fact concerning the relationship between logical and
physical names for physical entities (processors, devices) is that the
mapping can be one-to-one, many-to-many, one-to-many, or many-to-one. A
one-to-many mapping indicates a multi-processor configuration or an area
spread across several devices. A processor or storage device can be
identified by several logical names, thus producing the many-to-one
relationship. The many-to-many mapping is a merger of the two previously
mentioned situations. The flexibility of the logical-to-physical mapping
provides the data base administrator with considerable latitude in the
distribution of the data base.
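The four cases can be distinguished mechanically from the set of (logical, physical) pairs. A logical name with several physical targets gives one-to-many, a physical entity with several logical names gives many-to-one, and both together give many-to-many. The function below is an illustrative sketch and is not part of the proposed system.

```python
from collections import defaultdict


def classify_mapping(pairs: set[tuple[str, str]]) -> str:
    """Classify a logical->physical relation given as (logical, physical)
    pairs, using the four cases named in the text."""
    phys_per_logical = defaultdict(set)
    logical_per_phys = defaultdict(set)
    for logical, physical in pairs:
        phys_per_logical[logical].add(physical)
        logical_per_phys[physical].add(logical)
    one_to_many = any(len(s) > 1 for s in phys_per_logical.values())
    many_to_one = any(len(s) > 1 for s in logical_per_phys.values())
    if one_to_many and many_to_one:
        return "many-to-many"
    if one_to_many:
        return "one-to-many"
    if many_to_one:
        return "many-to-one"
    return "one-to-one"
```

For instance, an area spread over three packs yields one-to-many, which is the device mapping of the next paragraph.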
Figure 7 depicts a one-to-many device mapping. The area NAMES is spread
over three physical devices. The record occurrences FRED, VIRG, and PAUL
all reside on separate disk packs controlled by the same back-end
processor. The multi-device area concept is found in many commercially
available data base management systems.

A one-to-many processor mapping implies the existence of a multi-processor
back-end. A multi-processor back-end consists of several processors joined
via a memory-to-memory connection. Each processor has access to the areas
controlled by the back-end function shared by the processors. If a
processor in a multi-processor back-end configuration does not have a
direct physical connection to a device, it requests that a processor
having such a link perform the I/O transfers. With a shared-memory
interprocessor connection, such requests are performed at machine memory
speeds. Figure 8 illustrates a multi-processor back-end configuration. The
concept of multi-processor back-end machines is discussed by Lowenthal in
Reference [5]. Figure 9 portrays a many-to-many mapping which is realized
by combining the configurations shown in the two previous figures.

For a given distributed data base system, the range of a multi-processor
back-end configuration is limited by the network topology. A generalized
network such as MIMICS [4] is composed of a collection of machine clusters
joined together. Within each cluster, several processor nodes are linked
via high-speed memory connections (5-10 megabytes/sec). Figure 10 shows
the general topology of the MIMICS network.

Figure 8
Multi-Processor Back-End

The following rules apply to multi-processor back-ends:

1. The processors comprising a multi-processor back-end must be members of
the same cluster.

2. Areas may not span clusters.

As shown in Figures 8 through 10, it is not necessary for each node in a
multi-processor back-end cluster to have a communication link to a host
computer (or a cluster of host computers). The node with the communication
link (BE1 in Figures 8 and 9, for example) functions as the master back-end
for that multi-processor back-end configuration. The master handles the
communication operations and parcels out the DML commands to be executed
by the individual back-end processors.
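The two rules, together with the choice of a master, could be validated as follows. This sketch assumes inputs in which cluster membership, host links, and device attachment are given as dictionaries and sets. Picking the first member with a host link as master is an assumption of this sketch.

```python
def check_multiprocessor_backend(members: list[str], cluster_of: dict[str, str],
                                 host_links: set[str],
                                 area_devices: dict[str, set[str]],
                                 device_node: dict[str, str]) -> str:
    """Validate the two rules above and return the master node.

    Rule 1: all member processors lie in one cluster.
    Rule 2: no area spans clusters (every device of an area is attached
            to a node in that same cluster).
    The master is taken here to be the first member with a host link.
    """
    clusters = {cluster_of[m] for m in members}
    if len(clusters) != 1:
        raise ValueError(f"back-end spans clusters {sorted(clusters)}")
    cluster = clusters.pop()
    for area, devices in area_devices.items():
        if any(cluster_of[device_node[d]] != cluster for d in devices):
            raise ValueError(f"area {area} spans clusters")
    masters = [m for m in members if m in host_links]
    if not masters:
        raise ValueError("no member has a host communication link")
    return masters[0]
```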

An analysis of the performance, security, integrity, and economic benefits
of the multi-processor back-end concept has been performed by Lowenthal
[5].

B.  The  Inter-Process  Communication  System  (IPCS)  of  MIMICS 

Movement of (areas of) data in a DDBMS is accomplished via some
interprocess communication (message) utility which has the following
functions:

1) it makes the topology of the network transparent to the application
program;

2) it makes data distribution transparent to the application program;

3) it synchronizes the tasks which exchange data to ensure that no data is
lost, garbled, or pilfered;

4) it manages the names of network tasks;

5) and finally, it transmits data and commands between tasks (application
program and DBMS tasks, for example).

The concepts of the IPCS commands available at the task level are provided
in Figure 4.

The IPCS of MIMICS is connection-based. That is, the general scenario of
IPCS usage by a task is as follows:

1. CONNECT ( , , - )

2. Exchange data using SEND and RECEIVE; exchange commands using
SEND_COMMAND and ACCEPT_COMMAND; and WAIT on a particular function
completion when necessary.

3. DISCONNECT ( , , - )

The CONNECT/DISCONNECT functions establish and destroy data/command paths
between tasks. The SEND/RECEIVE functions exchange data. The IPCS
synchronizes the message SEND's from the source task and the RECEIVE's
from the destination task to assure that proper space is allocated for the
message.

The SEND_COMMAND/ACCEPT_COMMAND functions allow commands to be exchanged
between user tasks without any prior data space allocation. This permits
user tasks to exchange simple commands to establish a protocol for
exchanging messages. It also permits priority traffic for error or control
information to be exchanged. This is the mechanism used for the movement
of areas in a transparent manner. Since none of the above SEND's and
RECEIVE's forces the task to stop execution until completion (to provide
overlap of execution and data exchange), the task may choose to WAIT until
a later time on a particular SEND or RECEIVE (event).
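The semantics described above can be imitated inside one Python process: SEND and RECEIVE return at once and complete only when paired, each operation has an event on which WAIT can block, and commands need no buffer. The `Connection` class is a hypothetical sketch that uses `threading.Event` for the EVT_ID of Figure 4. CONNECT/DISCONNECT, task naming, and the network itself are not modelled.

```python
import threading
from collections import deque


class Connection:
    """In-process sketch of a connection-based IPCS link between two tasks.

    send() and receive() return immediately with a threading.Event; the
    transfer happens only once a SEND is paired with a RECEIVE whose buffer
    is large enough. Commands bypass this pairing.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._sends = deque()      # (message, event) waiting for a RECEIVE
        self._receives = deque()   # (buffer, event) waiting for a SEND
        self._commands = deque()   # small fixed-size commands, no buffer needed

    def send(self, message: bytes) -> threading.Event:
        evt = threading.Event()
        with self._lock:
            self._sends.append((message, evt))
            self._match()
        return evt

    def receive(self, buffer: bytearray) -> threading.Event:
        evt = threading.Event()
        with self._lock:
            self._receives.append((buffer, evt))
            self._match()
        return evt

    def send_command(self, command: str) -> None:
        self._commands.append(command)

    def accept_command(self):
        """Return the oldest pending command, or None if there is none."""
        return self._commands.popleft() if self._commands else None

    def _match(self):
        # Pair queued SENDs with queued RECEIVEs in arrival order.
        while self._sends and self._receives:
            message, send_evt = self._sends[0]
            buffer, recv_evt = self._receives[0]
            if len(buffer) < len(message):
                raise ValueError("receive buffer too small")  # simplification
            self._sends.popleft()
            self._receives.popleft()
            buffer[:len(message)] = message
            send_evt.set()   # both events become "HAPPENED"
            recv_evt.set()
```

A task that calls `evt.wait()` corresponds to WAIT (EVT_ID); until then it is free to continue computing.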

The general structure of the IPCS is shown in Figure 11. The application
program conceptually exchanges DML's and data/status with the DBMS task.
The User Envelope transmits these elements across the network as data and
commands. The User Envelope thus maps source and destination tasks onto
source and destination logical processors, and then onto physical
processors via the L-P-P map and the ALL Table. Thus the System Envelope
has the capacity to re-route DML's and data/status via a new mapping. The
Message System merely moves data between tasks which have established a
connection.

Figure 12 illustrates the method by which the movement of an area between
two back-end machines can be achieved in a manner transparent to an
application program. It is assumed that the application program task (APT)
and the DBMS have an established connection. The general procedure, from
the view of the message system, to move an area is as follows:
1) The User Envelopes exchange the desire (initiated from either
end) to DISCONNECT after a certain sequence of messages is complete.

2) Each does a DISCONNECT (QUIESCE). (These are synchronized by
the Message System.)

3) The AREA is transferred by a file transfer protocol [6] via a
connection between tasks $U_s$ and $U_r$. (See Figure 6.)

4) The User Envelopes on the Host and Back-End 2 (BE2) then connect
as follows:

The Host User Envelope requests a network task name for a DBMS task
on BE2 from the Network Resource Controller (NRC) on the Host. The
NRCs of all processors are fully connected, so the Host NRC passes
this request to the NRC on BE2, which responds with the task name. The
Host User Envelope then connects to the BE2 User Envelope, and DMLs and
data/status flow between the Host APT and the BE2 DBMS task.

It is important to note that the NRC must access only the L-P-P Map. It should
be clear from Figure 12 that the Host APT neither participates in the movement
of the AREA nor has any knowledge of it, other than by observing some change
in performance.
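The four-step procedure can be simulated in a few lines of Python to show why the APT is unaffected. The classes, the task names, and the step that re-points the logical back-end in the L-P-P Map are assumptions of this sketch; the file transfer of step 3 is reduced to moving a Python list, and a DML round trip to a list lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Processor:
    """Hypothetical processor: the areas it stores and its DBMS task name."""
    pid: str
    areas: Dict[str, list] = field(default_factory=dict)
    dbms_task: str = ""


class NRC:
    """Hypothetical Network Resource Controller (one per processor)."""

    def __init__(self, network: Dict[str, Processor]) -> None:
        self.network = network      # every NRC can reach every other (fully connected)

    def task_name(self, pid: str) -> str:
        # The Host NRC forwards the request to the NRC on `pid`, which answers.
        return self.network[pid].dbms_task


class UserEnvelope:
    """Host-side envelope; the application only ever calls fetch()."""

    def __init__(self, nrc: NRC, all_table: Dict[str, str], lpp_map: Dict[str, str]) -> None:
        self.nrc, self.all_table, self.lpp_map = nrc, all_table, lpp_map
        self.connected_to: Optional[Tuple[str, str]] = None   # (pid, DBMS task name)

    def connect(self, area: str) -> None:
        pid = self.lpp_map[self.all_table[area]]
        self.connected_to = (pid, self.nrc.task_name(pid))

    def fetch(self, area: str, index: int):
        if self.connected_to is None:
            self.connect(area)
        pid, _task = self.connected_to
        return self.nrc.network[pid].areas[area][index]   # stand-in for a DML round trip


def move_area(env: UserEnvelope, area: str, src: Processor, dst: Processor) -> None:
    env.connected_to = None                        # steps 1-2: agree, then DISCONNECT (QUIESCE)
    dst.areas[area] = src.areas.pop(area)          # step 3: file transfer of the AREA
    env.lpp_map[env.all_table[area]] = dst.pid     # assumed: logical back-end now maps to dst
    env.connect(area)                              # step 4: ask the NRC, then reconnect


be1 = Processor("BE1", {"AREA-X": ["r0", "r1"]}, "DBMS@BE1")
be2 = Processor("BE2", {}, "DBMS@BE2")
host = Processor("HOST")
env = UserEnvelope(NRC({p.pid: p for p in (be1, be2, host)}), {"AREA-X": "B1"}, {"B1": "BE1"})


def apt():
    """The application program task: it never names a processor."""
    return env.fetch("AREA-X", 1)


print(apt())                              # r1, served by BE1
move_area(env, "AREA-X", be1, be2)
print(apt(), env.connected_to)            # r1 ('BE2', 'DBMS@BE2')
```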

V.  Structure  of  Host  and  Back-End  Software 

A host computer in a DDBMS must contain software to execute application
programs, to select the proper back-end for data base functions, and to
communicate with the back-end processor. To meet these ends, a host must have
a software organization similar to that depicted in Figure 13. The
Logical-to-Physical Processor Map (L-P-P Map) is simply an array of physical
identifiers indexed by logical names. As mentioned in prior sections, this map
is used to select the back-end processor.

The Inter-Process Communication System (IPCS) [4] serves as a generalized
communication system for the data base network.


[Figure 13 diagram: user programs 1 through K, each with its user working
area, sub-schema, system locations, and DBMS interface; the DBMS; the Area
Logical Location Tables; the Logical-to-Physical Processor Map; and the IPCS
(Inter-Processor Communication System).]

Figure 13

Host Software Organization


A back-end processor must hold the sub-schema and DMCL Tables for each area in
its domain. Because multiple back-end processors may participate in the
execution of a DML statement, each back-end processor must contain complete
ALL Tables for all sub-schemas containing areas managed by that machine. The
back-end must also contain the Logical-to-Physical Processor Map in order to
determine physical processor identification. As shown in Figure 14, the
back-end software must also include a Device Header List (D.H. List), which
indicates the contents of the secondary storage units. The Device Header List
contains a linked list of areas, with each list headed by a device indicator
name.
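A Device Header List can be sketched as a collection of singly linked lists, one per secondary storage unit, each headed by the device indicator name. The class names, the device names DISK-0 and DISK-1, and the choice of appending new areas at the tail are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class AreaNode:
    """One area stored on a device; `next` links to the next area on that device."""
    area: str
    next: Optional["AreaNode"] = None


@dataclass
class DeviceHeader:
    """Head of one list: the device indicator name and its first area."""
    device: str
    first: Optional[AreaNode] = None

    def add_area(self, area: str) -> None:
        # Append at the tail so areas are listed in the order they were placed.
        node = AreaNode(area)
        if self.first is None:
            self.first = node
            return
        tail = self.first
        while tail.next is not None:
            tail = tail.next
        tail.next = node

    def areas(self) -> Iterator[str]:
        node = self.first
        while node is not None:
            yield node.area
            node = node.next


def device_of(dh_list: List[DeviceHeader], area: str) -> Optional[str]:
    """Return the device holding `area`, or None if this back-end does not store it."""
    for header in dh_list:
        if area in header.areas():
            return header.device
    return None


dh_list = [DeviceHeader("DISK-0"), DeviceHeader("DISK-1")]
dh_list[0].add_area("CUSTOMER-AREA")
dh_list[0].add_area("ORDER-AREA")
dh_list[1].add_area("HISTORY-AREA")
print(list(dh_list[0].areas()))              # ['CUSTOMER-AREA', 'ORDER-AREA']
print(device_of(dh_list, "HISTORY-AREA"))    # DISK-1
```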

It is important to note that a single machine may be both a host and a
back-end processor. In that situation, the software shown in both Figures 13
and 14 must reside on that machine. If a processor assumes more than one
logical host or back-end function, the DMCL Tables and sub-schemas for each
logical processor function must be stored in the memory of the machine.
Figure  15  illustrates  the  software  required  on  processor  P^  of  Figure  3. 

Pj  has  two  back-end  functions,  and  Bg,  and  one  host  function,  H^, 
associated  with  it. 
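The per-function bookkeeping on a shared machine can be sketched as follows. The Machine and LogicalFunction classes, the role names B1, B2 and H1, and the choice to record each binding in the L-P-P Map when a role is assumed are hypothetical; the point is only that each logical function carries its own sub-schemas and DMCL Tables.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalFunction:
    """One logical host or back-end role (names are hypothetical)."""
    name: str                                   # e.g. "B1", "H1"
    kind: str                                   # "host" or "back-end"
    sub_schemas: dict = field(default_factory=dict)
    dmcl_tables: dict = field(default_factory=dict)


@dataclass
class Machine:
    """A physical processor; each assumed role keeps its own tables in memory."""
    pid: str
    roles: dict = field(default_factory=dict)   # logical name -> LogicalFunction

    def assume(self, fn: LogicalFunction, lpp_map: dict) -> None:
        self.roles[fn.name] = fn
        lpp_map[fn.name] = self.pid             # assumed: record the binding in the L-P-P Map

    def needs_host_software(self) -> bool:
        return any(f.kind == "host" for f in self.roles.values())

    def needs_back_end_software(self) -> bool:
        return any(f.kind == "back-end" for f in self.roles.values())


lpp_map = {}
m = Machine("P2")
m.assume(LogicalFunction("B1", "back-end", {"SS-A": "<sub-schema A>"}, {"AREA-1": "DISK-0"}), lpp_map)
m.assume(LogicalFunction("B2", "back-end", {"SS-B": "<sub-schema B>"}, {"AREA-7": "DISK-1"}), lpp_map)
m.assume(LogicalFunction("H1", "host", {"SS-A": "<sub-schema A>"}), lpp_map)
print(m.needs_host_software(), m.needs_back_end_software())   # True True
print(sorted(lpp_map.items()))   # [('B1', 'P2'), ('B2', 'P2'), ('H1', 'P2')]
```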

VI.  The  Data  Dictionary 

Each processor in a DDBMS may require access to any schema, sub-schema, or ALL
Table. Copies of this information are maintained in the data dictionary. When
a DBMS application task is loaded onto a host processor, its associated
sub-schema and ALL Tables are obtained from the data dictionary. The data
dictionary also holds the potential Logical-to-Physical Processor Map, which
indicates those processors permitted to assume a particular logical name. This
list is used whenever an application program requests an area for which no
active processor is performing the corresponding back-end function. The
general structure of a DDBMS data dictionary is illustrated in Figure 16.
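The use of the potential L-P-P Map can be sketched in Python. The DataDictionary class, the activate function, and its policy of binding the first permitted processor that is currently available are assumptions; the text does not specify how the candidate is chosen.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class DataDictionary:
    """Hypothetical in-memory data dictionary."""
    sub_schemas: Dict[str, str]               # sub-schema name -> definition text
    all_tables: Dict[str, Dict[str, str]]     # sub-schema name -> {area: logical name}
    potential_lpp: Dict[str, List[str]]       # logical name -> permitted processors

    def load_task(self, sub_schema: str) -> Tuple[str, Dict[str, str]]:
        """Give a newly loaded application task its sub-schema and ALL Table."""
        return self.sub_schemas[sub_schema], dict(self.all_tables[sub_schema])


def activate(dd: DataDictionary, logical: str,
             active_lpp: Dict[str, str], available: Set[str]) -> str:
    """Return the processor performing `logical`, binding one if none is active.

    Assumed policy: take the first permitted processor that is available.
    """
    if logical in active_lpp:
        return active_lpp[logical]
    for pid in dd.potential_lpp[logical]:
        if pid in available:
            active_lpp[logical] = pid
            return pid
    raise LookupError(f"no permitted processor is available for {logical}")


dd = DataDictionary(
    sub_schemas={"PAYROLL": "<sub-schema text>"},
    all_tables={"PAYROLL": {"EMP-AREA": "B1", "DEPT-AREA": "B2"}},
    potential_lpp={"B1": ["P1", "P3"], "B2": ["P2", "P3"]},
)
_, all_table = dd.load_task("PAYROLL")
active = {"B1": "P1"}                                   # B2 currently has no processor
print(activate(dd, all_table["DEPT-AREA"], active, {"P1", "P3"}))   # P3
```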
The exact mechanism for implementing the data dictionary can vary with the
structure of the underlying computer network. The dictionary must reside on a
high-speed secondary storage device for rapid access. Because the data
dictionary is used mainly for queries, with few updates, multiple copies of it
may be maintained for the sake of reliability and faster reference. In certain
environments, it may be desirable to partition the data dictionary into
sub-dictionaries containing information on data residing in particular
sections of the network.
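One way to exploit the query-mostly usage is to write each rare update to every copy and answer each query from any copy that is up, optionally with one replicated sub-dictionary per network section. This read-one/write-all scheme, the section names, and the class names below are assumptions for illustration, not a mechanism the text prescribes.

```python
from typing import Dict, List


class ReplicatedDictionary:
    """Hypothetical read-mostly dictionary kept as several identical copies."""

    def __init__(self, n_copies: int) -> None:
        self.copies: List[Dict[str, str]] = [{} for _ in range(n_copies)]
        self.up: List[bool] = [True] * n_copies

    def update(self, key: str, value: str) -> None:
        # Updates are rare, so each one is written to every copy
        # (this sketch assumes all copies are reachable when updating).
        for copy in self.copies:
            copy[key] = value

    def query(self, key: str) -> str:
        # Queries are frequent; any copy that is up can answer.
        for copy, ok in zip(self.copies, self.up):
            if ok:
                return copy[key]
        raise RuntimeError("no copy of the data dictionary is available")


class PartitionedDictionary:
    """Hypothetical split into sub-dictionaries, one per network section."""

    def __init__(self, sections: List[str], n_copies: int = 2) -> None:
        self.parts = {s: ReplicatedDictionary(n_copies) for s in sections}

    def update(self, section: str, key: str, value: str) -> None:
        self.parts[section].update(key, value)

    def query(self, section: str, key: str) -> str:
        return self.parts[section].query(key)


dd = PartitionedDictionary(["EAST", "WEST"])
dd.update("EAST", "ALL:PAYROLL", "EMP-AREA->B1")
dd.parts["EAST"].up[0] = False          # lose one copy
print(dd.query("EAST", "ALL:PAYROLL"))  # still answered, by the second copy
```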


VII.  Conclusion 

This paper has presented a mechanism for the distribution of a CODASYL-type
data base management system in a manner that is transparent to the
application program. The software structure presented herein presupposes an
underlying computer network with the necessary hardware and software to allow
interprocessor communication via a standardized message system. The basis for
distribution in the DDBMS is the ALL Table, which provides information on the
location of each data base area.

The mechanisms detailed here provide a DDBMS communication facility that is
relatively easy to realize. However, many of the problems of distributed data
systems, as outlined by Fry and Sibley [7], still require practical solutions.
The dilemmas posed by deadlock, backup, recovery, and security are extremely
complex. Another formidable stumbling block for distributed data base systems
is the general lack of portability and compatibility within both the hardware
and software environments.

The system described here could be implemented with moderate effort on
homogeneous networks. For heterogeneous networks, advances in software
portability and standardized communication protocols are required.

Progress is being made in these areas, although it is hampered somewhat by the
marketing philosophy of "locking the user in" to a vendor's product line.